 visual grounding




SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion. Ming Dai, Lingfeng Yang

Neural Information Processing Systems

Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures for modal interaction and query reasoning.



CityRefer Datasheet. We follow the guidelines of the datasheets for datasets [1] to explain the composition, collection, recommended use case, and other details of the CityRefer dataset.

Neural Information Processing Systems

For what purpose was the dataset created? We created the CityRefer dataset to facilitate research toward city-scale 3D visual grounding. Who created the dataset (e.g., which team, research group) and on behalf of which entity? Who funded the creation of the dataset? What do the instances that comprise the dataset represent? CityRefer contains descriptions for 3D visual grounding on large-scale point cloud data.